Advancements of Outlier Detection: A Survey

Author

  • Ji Zhang
Abstract

Outlier detection is an important research problem in data mining that aims to find objects that are considerably dissimilar, exceptional and inconsistent with respect to the majority of the data in an input database [50]. Outlier detection, also known as anomaly detection in some of the literature, has become the enabling technology for a wide range of practical applications in industry, business, security and engineering. For example, outlier detection can help credit card companies identify suspicious, potentially fraudulent transactions. It can also be used to identify abnormal brain signals that may indicate the early development of brain cancers. Owing to its importance in these areas, considerable research effort has been devoted to outlier detection in the past decade, and a number of detection techniques have been proposed that use different mechanisms and algorithms. This paper presents a comprehensive review of the major state-of-the-art outlier detection methods. We cover the major categories of outlier detection approaches and critically evaluate their respective advantages and disadvantages. In principle, an outlier detection technique can be considered a mapping function f, expressed as f(p) → q, where q ∈ R. Given a data point p in the dataset, applying f to p generates an outlier-ness score q that quantitatively reflects the strength of outlier-ness of p. Based on the mapping function f, there are typically two major tasks for outlier detection, which lead to two corresponding problem formulations: from the dataset under study, one may want to find the top k outliers that have the highest outlier-ness scores, or all the outliers whose outlier-ness score exceeds a user-specified threshold.
The exact techniques or algorithms used in different outlier detection methods may vary significantly, depending largely on the characteristics of the datasets to be dealt with. The datasets could be static with a small number of attributes, in which case outlier detection is relatively easy. However, the datasets could also be dynamic, such as data streams, and at the same time have a large number of attributes. Dealing with this kind of dataset is inherently more complex and requires special attention to the detection performance (including speed and accuracy) of the methods to be developed. Given the abundance of research literature in the field of outlier detection, the scope of this survey is clearly specified first in order to facilitate a systematic review of the existing methods. After that, we start the survey with a review of the conventional outlier detection techniques that are primarily suitable for relatively low-dimensional static data, followed by some of the major recent advancements in outlier detection for high-dimensional static data and data streams.
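
The two problem formulations in the abstract (top-k outliers and threshold-based outliers) can be illustrated with a minimal Python sketch. The abstract does not prescribe a particular mapping function f, so the score used here, the distance from a point to its k-th nearest neighbour, is only an assumed stand-in chosen for illustration. The function names (`knn_distance_score`, `score_all`, `top_k_outliers`, `threshold_outliers`), the parameter `n_outliers` (playing the role of "k" in "top k outliers") and the sample points are all hypothetical.

```python
import math


def knn_distance_score(i, data, k=2):
    """Assumed mapping f(p) -> q: distance from data[i] to its k-th nearest neighbour.

    A larger score means the point is farther from its neighbours,
    i.e. it has stronger outlier-ness under this (illustrative) definition.
    """
    p = data[i]
    # Distances to every other point, excluding the point itself by index.
    dists = sorted(math.dist(p, q) for j, q in enumerate(data) if j != i)
    return dists[k - 1]


def score_all(data, k=2):
    """Apply the mapping function to every point in the dataset."""
    return [knn_distance_score(i, data, k) for i in range(len(data))]


def top_k_outliers(data, n_outliers, k=2):
    """Formulation 1: the n_outliers points with the highest scores."""
    scores = score_all(data, k)
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    return [data[i] for i in ranked[:n_outliers]]


def threshold_outliers(data, threshold, k=2):
    """Formulation 2: every point whose score exceeds a user-specified threshold."""
    scores = score_all(data, k)
    return [p for p, s in zip(data, scores) if s > threshold]


# Hypothetical toy dataset: four clustered points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

print(top_k_outliers(points, n_outliers=1))
print(threshold_outliers(points, threshold=1.0))
```

Both formulations share the same scores and differ only in how the final set is selected: the first fixes the number of reported outliers, the second fixes the minimum score. This brute-force scoring compares every pair of points, so it is only suited to small, static, low-dimensional data of the kind the abstract calls relatively easy.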

Similar resources

Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis

Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...

Outlier Detection: A Survey

Outlier detection has been a very important concept in the realm of data analysis. Recently, several application domains have realized the direct mapping between outliers in data and real-world anomalies that are of great interest to an analyst. Outlier detection has been researched within various application domains and knowledge disciplines. This survey provides a comprehensive overview of e...

Outlier Detection Using Extreme Learning Machines Based on Quantum Fuzzy C-Means

One of the most important concerns of a data miner is always to have accurate and error-free data: data that does not contain human errors and whose records are complete and correct. In this paper, a new learning model based on an extreme learning machine neural network is proposed for outlier detection. The function of neural networks depends on various parameters such as the structur...

Identification of outliers types in multivariate time series using genetic algorithm

Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased estimation of parameters and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...

Speeding problem detection in business surveys: benefits of statistical outlier detection methods

Speeding describes unusually fast responses to survey questions. A characteristic of speeders is that their answers bypass the cognitive process. Consequently, this low respondent engagement results in poor quality and validity of data. The issue at hand is how to detect speeders in a survey. The presumption is that different statistical outlier detection methods can be used. This paper prese...

Journal:
  • EAI Endorsed Trans. Scalable Information Systems

Volume 1  Issue

Pages  -

Publication date 2013